A Method for Selecting Between Linear and Quadratic Classification Models in Discriminant Analysis

Alice Meshbane and John D. Morris 
Florida Atlantic University 

A method for comparing the cross validated classification accuracies of linear
and quadratic classification rules is presented under varying data conditions for the
k-group classification problem. The classification rules are based on a Bayesian
conditional-probability model assuming multivariate normality within each criterion
population. Defining

$$D_{ik}^2 = (\mathbf{x}_i - \bar{\mathbf{x}}_k)' \, \mathbf{S}_k^{-1} \, (\mathbf{x}_i - \bar{\mathbf{x}}_k)$$

to be the square of the distance from the point in p-space representing individual i
(i.e., $\mathbf{x}_i$) to the point representing the means of the p measures in group k (i.e., $\bar{\mathbf{x}}_k$),
where $\mathbf{S}_k$ is the sample (p x p) covariance matrix for group k, the following
"quadratic" classification statistic is used:

$$P(k \mid \mathbf{x}_i) = \frac{p_k \, |\mathbf{S}_k|^{-1/2} \exp\left(-\tfrac{1}{2} D_{ik}^2\right)}{\sum_{k'=1}^{K} p_{k'} \, |\mathbf{S}_{k'}|^{-1/2} \exp\left(-\tfrac{1}{2} D_{ik'}^2\right)}$$

where $p_k$ is the prior probability of membership in population k. This latter
expression represents the (posterior) probability of individual i belonging to
population k. An individual is classified into that population from which the sample
yields the largest value of $P(k \mid \mathbf{x}_i)$. In this study, equal prior probabilities are used (that
is, $p_k = 1/K$) because it is not known whether the sample group sizes represent the
proportions found in the population. The linear classification rule used is based on
values determined as above except that the $\mathbf{S}_k$ matrices are replaced by $\mathbf{S}$, the pooled
sample (p x p) covariance matrix.
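
The quadratic and linear rules described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' FORTRAN program: the function names `posteriors` and `pooled_cov`, the toy means and covariance matrices, and the use of NumPy are all assumptions made for the example. The pooled matrix is computed with the usual degrees-of-freedom weighting, which the text does not spell out.

```python
import numpy as np

def posteriors(x, means, covs, priors):
    """Posterior probability of each group for one observation x.

    means: K mean vectors; covs: K (p x p) covariance matrices;
    priors: K prior probabilities. Works on the log scale for stability.
    """
    scores = []
    for m, S, prior in zip(means, covs, priors):
        d = x - m
        d2 = d @ np.linalg.solve(S, d)        # squared distance D^2
        _, logdet = np.linalg.slogdet(S)      # log |S|
        scores.append(np.log(prior) - 0.5 * logdet - 0.5 * d2)
    scores = np.array(scores)
    scores -= scores.max()                    # guard against underflow
    w = np.exp(scores)
    return w / w.sum()

def pooled_cov(covs, sizes):
    """Pooled covariance: sum of (n_k - 1) S_k divided by (N - K) (assumed weighting)."""
    num = sum((n - 1) * S for S, n in zip(covs, sizes))
    return num / (sum(sizes) - len(sizes))

# Hypothetical two-group, two-predictor example with unequal covariances.
means = [np.array([0.0, 0.0]), np.array([2.0, 2.0])]
covs = [np.eye(2), 4.0 * np.eye(2)]
sizes = [10, 10]
priors = [0.5, 0.5]                           # equal priors, p_k = 1/K
x = np.array([1.0, 1.0])

quad = posteriors(x, means, covs, priors)     # quadratic rule: separate S_k
S = pooled_cov(covs, sizes)
lin = posteriors(x, means, [S, S], priors)    # linear rule: pooled S for every group
```

In this toy case the point lies midway between the two means, so the linear rule cannot separate the groups, while the quadratic rule favors the group with the smaller covariance matrix.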

Theoretically, a quadratic classification rule should lead to higher cross
validation hit-rate accuracy than a linear classification rule when group
covariance structures are different (Anderson, 1984, p. 235). However, Huberty and
Curry (1978) found that a linear classification rule performed nearly as well as, or
better than, a quadratic rule in seven situations (the combined conditions of equal and
unequal covariance matrices, and two and three criterion groups, for three sets of real
data, using $n_k/N$ as the value for $p_k$). The authors point out that fewer parameters
need to be estimated with a linear rule (a pooled $\mathbf{S}$ matrix is used instead of separate
group $\mathbf{S}_k$ matrices), and thus greater across-sample stability might be expected
(Michaelis, 1973, p. 230). Also, "the assumption of normality seems to be more
critical for quadratic rules than linear rules" (Johnson & Wichern, 1992, p. 540).

Purpose 

This study extends the findings of Huberty and Curry by offering a method for
determining the superior classification rule for a specific data set, regardless of
covariance structure. In addition, a computer program that accomplishes the method
is introduced and demonstrated.
Method

The data 

Thirty-three classification data sets varying in number of subjects, predictor
variables, groups (two or three), and heterogeneity of covariance structure were 
employed to illustrate the method. To bolster validity, all data sets were taken from 
real classification studies. The sources were journal articles, paper presentations and 
research texts. No pathological distributional problems are known in any of the data 
sets; it is expected that they are much as one would find in typical classification 
studies. 
Procedure 

In comparing the predictive accuracy of the linear rule to that of the quadratic 
rule, "external" rather than "internal" results were considered. Results of an internal 
classification analysis are those obtained when measures for the individuals on whom 
the statistics were based are resubstituted to obtain the values. In an external 
classification analysis statistics based on one set of individuals are used in classifying 
"new" individuals. An external analysis is appropriate for making inferences about 
the discriminatory power of the predictors for a new set of data (Huberty, 1984). 

External, or cross validated, hit-rate accuracy was estimated using the
"leave-one-out" procedure. A subject is classified by applying the rule derived from all
subjects except the one being classified. This process is repeated "round-robin" for each
subject, with a count of the overall classification accuracy used to estimate the cross
validated accuracy. This procedure has a relatively wide following in the discriminant
analysis literature (see, for example, Huberty, 1984; Huberty & Mourad, 1980;
Lachenbruch, 1967; Mosteller & Tukey, 1968).
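
The leave-one-out procedure can be sketched as follows. This is an illustrative outline only: `leave_one_out_hits` is a hypothetical helper, and the nearest-group-mean rule on a single predictor is a stand-in chosen to keep the example short, not the linear or quadratic rule used in the study. The data are invented.

```python
def leave_one_out_hits(X, y, fit, predict):
    """hits[i] is True when subject i is classified correctly by a
    rule derived from all subjects except subject i."""
    hits = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]           # drop subject i
        y_train = y[:i] + y[i + 1:]
        rule = fit(X_train, y_train)
        hits.append(predict(rule, X[i]) == y[i])
    return hits

# Stand-in rule (assumption): assign to the group with the nearest mean.
def fit_nearest_mean(X, y):
    groups = {}
    for x, g in zip(X, y):
        groups.setdefault(g, []).append(x)
    return {g: sum(v) / len(v) for g, v in groups.items()}

def predict_nearest_mean(rule, x):
    return min(rule, key=lambda g: abs(x - rule[g]))

# Invented one-predictor data for two groups.
X = [1.0, 1.2, 0.8, 3.0, 3.1, 2.9, 1.9]
y = ["a", "a", "a", "b", "b", "b", "b"]

hits = leave_one_out_hits(X, y, fit_nearest_mean, predict_nearest_mean)
hit_rate = sum(hits) / len(hits)              # cross validated hit rate
```

Keeping the per-subject hit/miss list, rather than only the overall rate, is what makes a paired comparison between two rules possible later.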

Separate group as well as total group proportions of correct classifications 
were compared for the linear and quadratic rules. McNemar's (1947) test for 
contrasting correlated proportions was used in the statistical comparisons between 
linear and quadratic models for the separate group and total sample proportions. This 
method was previously suggested for comparing full and reduced classification 
methods (Morris & Huberty, 1991), but is equally applicable in comparing linear and 
quadratic models. [See Looney (1988) for a method of comparing classification 
results of more than two models.] Because the calculation of the McNemar correlated
proportion statistic requires the joint distribution of "hits" and "misses" for both the
linear and quadratic classification rules, no statistical package will accomplish the
method. Therefore, a FORTRAN computer program was written to provide this
information.
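
To show why the joint distribution of hits and misses is needed, here is a minimal sketch of a McNemar z statistic computed from paired classification outcomes. It assumes the uncorrected form z = (b - c) / sqrt(b + c), where b and c count the subjects classified correctly by only one of the two rules; the paper does not state whether a continuity correction was applied. The function name `mcnemar_z` and the outcome lists are hypothetical.

```python
import math

def mcnemar_z(linear_hits, quadratic_hits):
    """z statistic for contrasting correlated proportions.

    Only discordant pairs matter: b = linear hit and quadratic miss,
    c = quadratic hit and linear miss. Positive z favors the linear rule.
    """
    pairs = list(zip(linear_hits, quadratic_hits))
    b = sum(1 for lin, quad in pairs if lin and not quad)
    c = sum(1 for lin, quad in pairs if quad and not lin)
    if b + c == 0:
        return 0.0                            # the two rules never disagree
    return (b - c) / math.sqrt(b + c)

# Invented outcomes for 16 subjects: 4 both hit, 8 linear only,
# 2 quadratic only, 2 both miss.
linear_hits = [True] * 12 + [False] * 4
quadratic_hits = [True] * 4 + [False] * 8 + [True] * 2 + [False] * 2

z = mcnemar_z(linear_hits, quadratic_hits)
significant = abs(z) > 2.58                   # two-tailed test at the .01 level
```

The marginal hit rates alone (12 of 16 versus 6 of 16 here) do not determine b and c, which is why the paired hit/miss record from both rules has to be retained.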

The Box test was used for testing the assumption of homogeneity of covariance
structures. Notwithstanding concerns over this test, one could argue that,
theoretically, a quadratic classification rule is appropriate when the Box test indicates 
that the covariance structures are unequal. 
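
For readers who want to reproduce the homogeneity check, the following is a rough sketch of Box's M statistic with its common chi-square approximation. It assumes the textbook form of the test; the paper does not say which approximation its program uses. The function name `box_m` and the small data arrays are invented for the example, and NumPy is assumed to be available.

```python
import numpy as np

def log_det(S):
    """Natural log of the determinant of a positive definite matrix."""
    return np.linalg.slogdet(S)[1]

def box_m(groups):
    """Box's M for a list of (n_k x p) data arrays.

    Returns (M, chi2, df), where chi2 = (1 - c) * M is the usual
    chi-square approximation with df = p (p + 1) (K - 1) / 2.
    """
    K = len(groups)
    p = groups[0].shape[1]
    sizes = np.array([g.shape[0] for g in groups])
    N = sizes.sum()
    covs = [np.cov(g, rowvar=False) for g in groups]
    pooled = sum((n - 1) * S for n, S in zip(sizes, covs)) / (N - K)
    M = (N - K) * log_det(pooled) - sum(
        (n - 1) * log_det(S) for n, S in zip(sizes, covs)
    )
    c = ((2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (K - 1))) * (
        np.sum(1.0 / (sizes - 1)) - 1.0 / (N - K)
    )
    df = p * (p + 1) * (K - 1) // 2
    return M, (1 - c) * M, df

# Invented data: the second call triples the spread of one group.
base = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 6.0]])
M_same, chi2_same, df = box_m([base, base.copy()])
M_diff, chi2_diff, _ = box_m([base, 3.0 * base])
```

M is zero when the group covariance matrices are identical and grows as they diverge. The chi-square value would be compared with the critical value for `df` degrees of freedom to decide whether the equal-covariance assumption is tenable.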

Results and Discussion 

For each of the data sets, Table 1 gives a short description, the number of
subjects (N), the number of predictor variables (p), results of the Box test for
homogeneity of covariance structures, the appropriate classification rule (quadratic
when the Box test suggests the assumption of equal covariance matrices is untenable),
and a comparison of the performance of the linear and quadratic rules for each group 
separately and for the total sample. Performance of the two classification rules, 
displayed as the hit rate percent obtained by the p predictor variables, was compared 
via McNemar's test for contrasting correlated proportions. 

As can be seen in Table 1, differences between the linear and quadratic rules
in classifying the total sample were not statistically significant (z < 2.58, p > .01,
two-tailed test), with the exception of data set 2. Here, the linear rule was judged 
appropriate by the Box test and yielded a significantly higher total hit rate. 
Differences between the two classification rules in separate group hit rates were 
statistically significant in nine of the 33 data sets: the linear rule outperformed the 
quadratic rule in data sets 3 and 4, where the linear rule was judged appropriate by 
the Box test, and in data sets 5, 7, 11, 13, 14, and 17, where the quadratic rule was 
judged appropriate by the Box test; the quadratic rule outperformed the linear rule in 
five situations where the Box test indicated the quadratic rule was appropriate (data 
sets 5, 11, 14, 16, and 17), but in no situation where the Box test indicated the linear 
rule was appropriate. These results are summarized in Table 2. 

Although some researchers have urged caution in using anything but equal 
prior probabilities of group membership for classification (e.g., Lindeman, Merenda, 
& Gold, 1980, pp. 211-212), data sets with unequal numbers of subjects per group 
(data sets 1-4, 6, and 22-33) were tested using prior probabilities of $n_k/N$ for the
purpose of replicating Huberty & Curry's (1978) study. The results were identical to
the findings reported above for equal priors, with the following trivial exceptions: (1)
in data sets 2, 3, and 4, the difference between the two models was no longer
statistically significant; (2) in data set 3, the separate group hit rate for the quadratic 
rule was significantly higher than for the linear rule. 

These results extend the findings of Huberty & Curry (1978) to a broader 
range of data sets. More important, however, is that a method is now available for 
comparing the performance of the two rules. The method will be helpful in 
determining when to use a quadratic rather than a linear classification rule to 
maximize classification accuracy for a specific data set in a predictive discriminant 
analysis. 

If you would like a copy of the FORTRAN program that accomplishes the 
method, just send a returnable 5 1/4" or 3 1/2" diskette and diskette mailer to: 
John D. Morris
Florida Atlantic University
Department of Educational Foundations and Technology
College of Education
P.O. Box 3091
Boca Raton, FL 33431-0991
Table 2

Summary of Linear vs. Quadratic Classification Model Superiority by Condition
(Equal or Unequal Covariance Matrices)

Equality of Covariance    Appropriate Rule    # of Data Sets in which      # of Data Sets in which
Matrices Based on         Based on            Linear Model Hit Rate        Quadratic Model Hit Rate
Box M Test                Box M Test          was Superior*                was Superior*
                                              Separate Group    Total      Separate Group    Total

Equal (11 data sets)      Linear              2                 1          0                 0
Unequal (22 data sets)    Quadratic           6                 0          5                 0

* z > 2.58, p < .01



